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ABSTRACT 


Parameter  values  of  nonlinear  statistical  models  are  typically  estimated  from  data 
using  Iterative  numerical  procedures.  The  resulting  Joint  sampling  distribution  of  the 
parameter  estimators  Is  often  Intractable,  resulting  In  the  use  of  approximators  or  Monte 
Carlo  simulation  to  determine  properties  of  the  sampling  distribution. 

This  paper  develops  methods,  using  linear  and  quadratic  approximators  as  control 
variates,  that  reduce  the  variance  of  the  Monte  Carlo  estimator  by  orders  of  magnitude. 
Estimation  of  means,  higher  order  raw  moments,  variances,  covariances,  and  percentiles 
Is  considered. 


Keywords:  Control  variates;  Monte  Carlo;  Nonlinear  estimation; 

Simulation;  Swindle;  and  Variance  reduction. 
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1.  INTRODUCTION 


Statistical  models  are  used  In  many  fields  to  relate  dependent  variables  to 
Independent  variables.  For  linear  models  with  additive  Independent  and  normally 
distributed  errors  and  for  a  few  extensions,  the  statistical  theory  Is  well-established  and 
fitting  the  model  to  data  Is  straightforward.  For  nonlinear  models  or  for  models  without 
normally  distributed  errors  the  theory  Is  less  complete  and  Iterative  numerical 
procedures  are  needed  for  fitting,  which  Is  the  price  paid  for  a  better  fitted  model  with 
associated  better  predictions  and  Inference. 

Both  the  precision  of  predictions  and  the  statistical  significance  of  Inferences 
depend  upon  the  sampling  distribution  of  the  estimators  of  the  model  parameters.  This 
typically  multivariate  sampling  distribution  Is  often  summarized  by  statistical  properties 
such  as  means,  higher  order  moments,  variances,  covariances,  percentiles  and  quantiles. 

For  nonlinear  models  these  properties  are  usually  unobtainable  analytically. 
Methods  for  approximating  these  properties  have  followed  three  approaches:  asymptotic 
results,  series  expansions,  and  Monte  Carlo  sampling.  The  usual  advantages  and 
disadvantages  of  these  approaches  hold  here:  Asymptotic  results  are  often  easy  to 
obtain  but  can  have  large  error  for  small  sample  sizes;  approximations  based  on  series 
expansions  have  fixed  accuracy  and  can  be  difficult  to  obtain,  particularly  In 
multivariate  cases;  and  Monte  Carlo  sampling  Is  easy  to  Implement  but  requires  large 
sample  sizes  to  be  accurate. 

In  this  paper  the  efficiency  of  the  Monte  Carlo  sampling  approach  Is  Improved  by 
using  series  expansion  approximations  as  control  variates.  Roughly,  the  Improvement 
results  from  sampling  to  estimate  the  error  of  the  approximation  rather  than  to  estimate 
the  property  of  Interest  directly.  The  resulting  methods  require  few  assumptions  and 
are  straightforward. 

The  theory  developed  here  Is  oriented  toward  providing  the  methodology  for 
efficient  Monte  Carlo  study  of  statistical  models.  In  particular,  the  results  are  oriented 
toward  development  of  a  computer  package  that,  given  only  the  description  of  the 
model  of  Interest,  automatically  and  efficiently  evaluates  the  model.  Such  a 
methodology  Is  Important  since  the  use  of  nonlinear  models  Is  growing,  due  to  advances 
In  both  computer  hardware  and  In  software  for  fitting  nonlinear  models.  But  these 
computer  advances,  while  adequate  for  fitting,  often  are  not  sufficient  for  the  Monte 
Carlo  studies  necessary  to  study  nonlinear  models,  particularly  given  the  Increasing  use 
of  microcomputers  In  Interactive  environments.  In  addition,  automation  Is  Important 
since  most  analysts  lack  the  time  and/or  training  to  develop  variance  reduction  methods 
manually.  Swain  (1082)  and  Swain  and  Schmelser  (1983)  reduce  variance  by  orders  of 
magnitude  for  a  variety  of  models  using  the  variance  reduction  methods  developed  here. 

Section  2  Introduces  notation,  terminology,  and  problem  statements.  Section  3 
surveys  approximation  methods  for  nonlinear  models.  Monte  Carlo  sampling  In  the 
context  of  nonlinear  statistical  models,  with  emphasis  on  control  variates  for  variance 
reduction.  Is  developed  In  Section  4.  Section  5  develops  these  Ideas  for  the  cases  of 
estimating  means,  higher  order  moments,  variances  and  covariances,  and  percentiles. 


2.  NOTATION,  TERMINOLOGY,  AND  PROBLEM  STATEMENTS 

The  following  notation  Is  used.  Random  variables  are  denoted  by  upper-case 
Roman  letters,  upper-case  Greek  letters,  or  lower-case  Greek  letters  augmented  with 
carets  and  bars.  Realizations  of  random  variables  are  denoted  by  lower-case  letters,  as 
are  constants  and  functions.  Variables  are  vectors  or  matrices  with  scalar  components. 
Exceptions  are  specifically  Identified.  Vectors  are  columns,  with  row  vectors  denoted  by 
the  transpose  of  a  vector.  Powers  of  a  vector,  V* ,  denote  componentwise 
exponentiation.  The  kth  moment  about  the  origin  Is  denoted  by  E[V*].  Cov  [V ,W] 
denotes  the  pxq  covariance  matrix  E  ((V-E  [V]XIV-E [ W])r]  of  the  p-  component  random 
vector  V  and  the  q -component  random  vector  W .  The  variance  of  V,  Cov  [  V.  V],  Is  also 
denoted  by  Var  (V). 

Consider  the  nonlinear  q  -dimensional  response  function  defined  over  an  r- 
dlmenslonal  domain, 

B  [W -«T «';#*) 

where  Y(  =  (Yiv  Yia.  •••,>%)  Is  the  measured  response,  X,’  =  (X,\  ,  X/a  ,  •  •  • .  X£)  Is  the 
actual  design  point,  and  9*  —  (6‘.  0a‘,  •  ■  • ,  Of)  Is  the  actual  parameter  value  for  the 
surface.  Y,  has  the  associated  error  random  variable 

E{  =  frlWO 

where  £,  =  Eit,  •  •  • ,  Eit ).  These  errors  arise  from  four  sources:  error  In  controlling 
X{ ,  error  In  measuring  X< ,  error  In  measuring  Y( ,  and  random  variation  In  the  system. 
Reflecting  these  sources  or  error,  the  model  can  be  specified  by  {if* .  0 ',<?*,  H‘}f  where 
G‘  Is  the  multivariate  cdf  associated  with  the  design  matrix  and  H‘  Is  the  multivariate 
cdf  associated  with  the  errors.  There  are  three  design  matrices  of  Interest:  (1)  X',  the 
nominal  design  matrix,  (2)  X‘ ,  the  actual  design  matrix,  and  (3)  X,  the  measured  design 
matrix.  Correspondingly,  G‘  has  three  components:  (1)  G£>,  (2)  G‘. ,  and  (3)  G*.  G‘.  Is 
conditional  on  the  outcome  of  G£>,  and  Gf  Is  conditional  on  the  outcomes  of  both  G£, 
and  G^. .  Similarly,  H‘  Is  the  multivariate  cdf  of  the  two  types  of  error  terms,  E'  and 
Ef,  whose  Individual  cdfe  are  denoted  by  H".  and  H&.  The  error  term  E  Is  the 
convolution  of  the  (possibly  dependent)  error  terms  E‘ ,  the  random  variation  In  the 
system,  and  Ef,  the  error  In  measuring  Y .  H‘.  Is  conditional  on  the  outcome  of  G^. .  H& 
Is  conditional  on  the  outcome  of  Gj  and  H", .  These  relationships  among  G’  and  H' 
are  stated  In  algorithm  form  In  the  discussion  of  the  Monte  Carlo  method  at  the 
beginning  of  Section  4. 

A  surface  n  Is  fitted  by  determining  an  estimate  of  9'  as  a  function  of  observed 
data  (*,  y),  where  x  ==(*,,  *a>  •  •  •  ,  *,  )T  and  *  =(y,.  y9>  ••  •  ,  y.)r,  which  arise  from  the 
actual  model  W .0' .  G' .  H'}.  Since  the  actual  model  Is  unknown,  the  analyst  must 
assume  a  family  of  models,  denoted  here  by  {q,  T,  G ,  H },  and  referred  to  as  the  unfitted 
assumed  model.  G  and  H  are  analogous  to  G  ‘  and  H '  of  the  true  model.  The  feasible 
region  for  9  Is  denoted  by  TCR',  which  arises  from  Inherent  properties  of  o  or  from  the 
context  of  the  application. 

Denote  the  (typically  Iterative)  procedure  used  to  determine  9  by  k, ,  where  «  Is  a 
short  notation  for  « (9;  tj,  G ,  H ,  x ,  * ),  a  scalar  function  of  the  data  and  the  unfitted 


assumed  model.  Such  procedures  almost  always  depend  upon  the  residuals 
e(0)  =  (e,(0).  «a(0),  ■  •  • ,  e.(tf))r,  where  «,(*)=  jr,  -ij (*;  0)  for  0€T.  The  two  most  common 
examples  of  »  are  the  sum  of  squared  residuals  and  the  likelihood  of  the  residuals. 
Procedures  h,  minimize  the  former  and  maximize  the  latter. 

Point  estimation  Is  then 

POINT:  Given  the  unfitted  assumed  model 

{»i.  T ,  G ,  H },  data  (i.y),  and  procedure  A, ,  determine  i. 

In  general,  POINT  Is  a  nonlinear  programming  problem,  with  an  objective  function  * 
and  with  constraints  0  &T,  that  Is  solved  by  procedure  h , .  Whether  or  not  A,  results  In 
the  optimal  nonlinear  programming  solution  Is  Irrelevant.  However,  excluded  from  dis¬ 
cussion  In  this  article  are  procedures  that  can  return  a  different  estimate  on  different 
applications  to  the  same  data  with  the  same  unfitted  assumed  model,  such  as  sometimes 
occurs  when  the  analyst  provides  Initial  values  to  the  procedure. 

Having  solved  POINT ,  the  practitioner  Is  faced  with  the  problem  underlying  this 
paper:  evaluating  the  quality  of  the  estimator  e. 

DIST:  Given  the  assumed  unfitted  model  {»;.  T,  G,H) 
and  procedure  A,  used  In  POINT,  and  the  true  fitted 
model  {?*,  0*.  G\  H'},  determine  properties  of  the  mul¬ 
tivariate  distribution  of  e. 

An  Important  application  of  DIST  Is  to  study  the  effect  of  assuming  the  wrong  model, 
which  occurs  when  one  or  more  of  ij  =  i?*,  G  =  G’ ,  or  H  ==  H*  do  not  hold.  For  exam¬ 
ple,  H  j£H'  when  testing  sensitivity  of  the  sampling  distribution  to  departures  from  the 
assumed  error  distribution  H,  which  Is  typically  Independent,  Identically-distributed, 
zero-mean  normal. 

Unlike  the  linear  case.  In  the  general  nonlinear  case  the  distribution  of  e  depends 
In  a  nontrivial  way  on  0*.  Although  the  researcher  knows  the  value  of  0*,  the  practi¬ 
tioner,  who  cannot  know  0*,  can  only  solve  DIST  for  several  values  of  0*  In  a  region  of 
Interest. 

DIST  Is  a  very  general  problem  —  much  more  general  than  typically  considered  In 
research  on  nonlinear  models.  Monte  Carlo  methods  alone  have  the  ability  to  analyze 
all  special  cases  of  this  general  problem.  However,  the  extensive  literature  of  approxi¬ 
mations  for  subsets  of  this  problem  are  useful  as  the  basis  for  variance  reduction  tech¬ 
niques  for  Monte  Carlo  solutions,  as  developed  In  this  paper. 


3.  APPROXIMATION  METHODS 

Approximations  are  crucial  to  both  the  estimation  problem  POINT  and  the  distri¬ 
bution  problem  DIST  In  nonlinear  models.  The  solution  procedure  A ,  In  POINT  often 
uses  low-order  approximations  as  the  basis  of  Iterating  to  an  Improved  solution.  In 
DIST,  approximations  sometimes  provide  easy  to  compute,  fairly  accurate  approxima¬ 
tions  to  the  distribution  of  ©.  This  section  discusses  a  linear  approximation 


corresponding  to  the  least  squares  estimator  In  the  context  of  both  POINT  and  DIST. 
This  approximation  and  the  analogous  quadratic  approximation  are  good,  but  not  the 
only  possible,  choices  for  use  In  the  control  variate  variance  reduction  techniques 
developed  later  In  this  article. 

Consider  a  procedure  h,  based  on  minimizing  the  (weighted)  sum  of  the  squared 
residuals  for  a  scalar  (q  =1)  response, 

,(0)=*eT(S)  W  t(6), 

where  W  is  a  matrix  of  weights.  When  the  residuals  depend  upon  a  nonlinear  function 
of  the  data  the  normal  equations  require  solution  of  simultaneous  nonlinear  equations. 
One  approach  Is  to  linearize  the  residuals  at  the  current  Iterate,  using 

e  (0)  »  y  V*')  -  F 

where  F(#/))  Is  the  nxp  Jacobian  matrix  of  first  derivatives  d  »?(*,  ,8)/d  0*  (i  « 1,2.  •  •  • .« 
and  k  — 1.2,  •  •  •  ,p )  evaluated  at  0  —  d01.  This  approximation  leads  to  the  Gauss  update 
of  the  Newton  algorithm 

$o-m)  =  j<»  +  w  Ftf*'))- '  Fr(d°'))  W  t (&**)  . 

This  approximation  can  also  be  viewed  as  using  only  the  leading  first-order  terms  In  the 
second  derivatives  (the  square  brackets,  above)  In  the  Newton  update. 

There  Is  little  special  about  the  choice  of  this  linear  approximator  or  about  least 
squares.  When  higher  derivatives  can  be  computed,  higher-order  approximations  are 
sometimes  used  for  greater  accuracy  or  for  numerical  stability.  Series  approximations 
can  also  be  used  to  create  algorithms  to  solve  estimation  problems  with  maximum  likeli¬ 
hood  and  other  loss  functions. 

Now  consider  linear  approximations  applied  to  DIST.  For  example,  consider  a 
weighted  least  squares  estimation  problem  with  q  —  ti’,  a  deterministic  design  matrix, 
and  weight  matrix  W~y  ssVarJE].  In  the  Gauss  update  formula,  replacing  0*  with  001 
yields  the  residuals  I?(0*)  «=  E  and  the  linear  approximator  A 

A  —  0*  +  (Fr(0*)  W  F(0*)]-1  Ft(9‘)  W  E  . 

The  approximator  a  has  readily  determined  properties.  When  E  ~  N(o.W),  then  a  ~ 
N {0‘  ,Var  [A]),  where  Var[A]  ==  [Fr(0*)  W  F{9'))'\  The  distribution  of  a  Is  an  approxima¬ 
tion  of  the  sampling  distribution  of  ©,  although  It  provides  no  Information  about  the 
bias  of  ©  as  an  estimator  of  Gallant  (1975)  uses  such  a  linear  approximator  to  con¬ 
struct  approximators  to  the  likelihood  ratio  test.  The  generalization  for  multiple 
responses  (?  >1)  Is  discussed  In  Bard  (1974). 

Again  this  linear  approximator  and  least  squares  are  Just  examples.  In  the  mul¬ 
tiparameter  quadratic  case  considerable  ingenuity  (and  perhaps  further  approximation) 
must  be  exercised  to  specify  the  approximator  as  a  function  of  the  appropriate  deriva¬ 
tives  and  the  error  vector,  E.  Approximators  are  also  often  based  on  asymptotic  pro¬ 
perties.  Both  the  least  squares  and  maximum  likelihood  estimators  have  asymptotic 
normal  distributions  under  general  conditions  (Chambers,  1977;  Crowder,  1976;  Wilks, 
1982;  and  Wu,  1981).  Shenton  and  Bowman  (1977)  discuss  maximum  likelihood  In 
detail. 

8 


The  linear  approximation  Is  useful  In  both  POINT  and  DIST.  In  POINT ,  the  reali¬ 
zation  S  approximates  d.  In  DIST ,  the  distribution  of  the  random  variable  a  approxi¬ 
mates  the  distribution  of  e.  Both  are  referred  to  as  the  delta  approximation  throughout 
the  rest  of  this  paper.  The  analogous  quadratic  approximation,  with  associated  random 
variable  denoted  by  r  and  realizations  7,  Is  referred  to  as  the  gamma  approximation. 
The  gamma  approximation  for  DIST  Is  based  on  results  for  the  least  squares  estimator  In 
Box  (1971)  and  Clarke  (1980),  who  provide  expressions  for  the  bias  and  variance  matrix. 
The  corresponding  quadratic  approximation  for  POINT  Is  developed  In  Swain  (1985), 
who  extends  Clarke's  result  to  the  specific  context  of  Monte  Carlo  studies. 

The  accuracy  of  approximations  Is  fixed  by  the  true  model  {»»*,  O’,  G‘,  H‘)  and 
the  unfitted  assumed  model  {g.T.G.H),  although  occasionally  Increased  accuracy  Is 
possible  by  extending  the  order  of  the  approximation.  Measures  of  nonlinearity  (Beale, 
1960;  Bates  and  Watts,  1980)  give  an  Indication  of  the  accuracy  of  approximations,  but 
the  measures  can  be  misleading,  as  when  errors  are  nonnormal  (Glllls  and  Ratkowsky, 
1978). 

4.  MONTE  CARLO  SAMPLING  AND  CONTROL  VARIATES 

Unlike  approximations,  Monte  Carlo  methods  can  provide  any  level  of  accuracy 
with  sufficient  sampling  effort.  The  most  straightforward  Monte  Carlo  simulation 
analysis,  termed  direct  Monte  Carlo,  and  the  Ideas  underlying  reducing  Monte  Carlo 
sampling  error  via  control  variates  are  discussed  In  this  section. 

Consider  the  direct  Monte  Carlo  sampling  analysis  of  DIST:  m  repeated  observa¬ 
tions  or  e,  denoted  by  i , .  «  =  1,2.  •  •  ,m ,  are  generated  and  the  properties  of  the  sam¬ 
pling  distribution  are  estimated  from  the  sample.  In  the  direct  experiment,  quantities 
with  subscript  1 :«  are  scalar  components  of  vector  quantities  denoted  by  subscripts  :« . 
This  direct  experiment  defines  the  class  of  problems  considered  In  this  article.  Corre¬ 
lated  errors  are  not  excluded.  Extensions  such  as  multiplicative  errors  are  straightfor¬ 
ward  but  not  considered. 

The  Direct  Monte  Carlo  Experiment 

1.  For  u  —  1.2,  •  •  •  ,m 

a.  Generate  x.'„  from  Gy, 

b.  Generate  x*  from  G‘.  given  x.,' 

c.  Generate  x;,  from  Gy  given  x;„'x*. 

d.  Generate  e.‘  from  H‘.  given  x;« 

e.  For  t  =  i,2,  •••,«;  calculate  y,',  =  iT(x,*.;  O’)  + 

f.  Generate  from  H£.  given  x.'f  x.' ,  x., ,  and  y*. 

g.  For  «  =  1,2,  ■••.»*;  calculate  yi;,  = 

h.  Solve  POINT  using  h,  on  <x:,,  y:.)  assuming  {17,  T.G.H},  which  leads  to  lt , 
possibly  through  the  residuals  e  (0)  -=  y:.  -  tj(x:w ,  0) 


2.  Compute  estimates  of  Interest  from  d.. ,  «  =  1,2.  •■*,«* 

The  usual  estimates  of  the  marginal  raw  moments  and  the  variance  matrix  of  e  are 

/»*'  =  w"1  E  (*:•)*  *=1.2.  •  •  • 

•  —I 

and 

f;  (#.. 

•  ■k 

respectively. 

While  the  direct  Monte  Carlo  method  Is  conceptually  simple  to  Implement  and 
provides  consistent  estimates  of  the  desired  distribution,  standard  error  decreases  only 
with  m,/a.  Thus  large  Monte  Carlo  sample  sizes  are  needed  to  obtain  precise  estimates 
with  the  direct  experiment.  Worse,  each  observation  Is  expensive,  not  because  of  the 
sampling  from  G'  and  H' ,  but  because  k.  In  step  2  Is  typically  a  nonlinear  program¬ 
ming  algorithm  In  p  variables.  Therefore,  variance  reduction  techniques  are  Important 
when  using  Monte  Carlo  methods  to  analyze  DIST. 

Variance  reduction  techniques  are  almost  as  old  as  Monte  Carlo  sampling;  In  fact 
the  term  Monte  Carlo  was  once  reserved  for  sampling  methods  In  which  some  sort  of 
swindle  was  used  to  reduce  variance.  Consequently,  a  large  literature  exists;  Hartley 
(1077),  Kahn  (1050),  McGrath  and  Irving  (1073),  and  Wilson  (1084)  are  Interesting  sur¬ 
veys.  Examples  of  variance  reduction  In  Monte  Carlo  studies  In  statistical  settings 
Include  Andrews,  Blckel,  Hampel,  Huber,  Rogers,  and  Tukey  (1072),  Arnold,  Bucher, 
Trotter,  and  Tukey  (1056),  Gallant  (1080),  Kleljnen  (1077),  Koehler  (1081),  Relies 
(1074),  and  Schruben  and  Margolin  (1078).  Nelson  (1083)  and  Nelson  and  Schmelser 
(1084a,  1084b,  1085)  propose  a  framework  for  variance  reduction. 

The  variance  reduction  methods  developed  In  the  next  section  use  the  delta  or 
gamma  approximations,  discussed  In  the  last  section,  as  control  variates.  The  remainder 
of  this  section  reviews  control  variates,  drawing  from  Lavenberg  and  Welch  (1081), 
Cheng  (1078),  and  Rubinstein  and  Marcus  (1085).  The  method  of  control  variates  Is 
treated  In  most  Monte  Carlo  texts  (Hammersley  and  Handscomb,  1064)  and  advanced 
simulation  texts  (Kleljnen,  1074;  Fishman,  1078;  Law  and  Kelton,  1082;  Bratley,  Fox, 
and  Schrage,  1083). 

Consider  for  now  a  primary  estimator  Z ,  a  p  random  vector,  and  a  control  variate 
C ,  an  /  random  vector,  whose  mean  Is  known.  The  control  variate  estimator  Is 

Z  (B)  =  Z -B(  C  - E  [C]) 

where  B  Is  a  pxl  weight  matrix.  Z  ( B )  has  the  same  expected  value  as  Z  —  Z  (0)  If  B  Is  a 
constant  or  Is  Independent  of  C .  The  variance  of  Z  (B)  Is  the  matrix 

Var(  Z  (B )]  =  Var  {  Z  ]  +  £Var[  C  )BT  -BCov[C.Z  )- Cor  [Z.C  )  BT 
whose  determinant  Is  minimized  by  the  setting  B  equal  to 

B"  =  Ccv  [  Z  ,C  1  Var  -l  (  C  )  . 
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In  addition  to  minimizing  the  generalized  variance,  |Var[£  (£)]|,  this  set  of  control 
weights  also  minimizes  the  trace  of  the  variance  matrix.  Rubinstein  and  Marcus  (1085) 
show  that 

I  Var  [  Z  (B‘)\  |  -  |  Var  [  Z  (0)]  |  n  (1  ~pf) 

i  —i 

where  the  p are  the  canonical  correlations  between  Z  and  C  (Anderson,  1058). 

External  control  variates  are  random  variables  analogous  to  Z  but  from  another 
system.  This  other  system  Is  simulated  so  as  to  Induce  a  correlation  between  C  and  Z , 
thereby  reducing  the  variance  from  that  of  Z  to  that  of  Z(B).  The  correlation  Is 
Induced  by  subjecting  both  systems  to  the  same  uniform  (o.i)  random  number  streams, 
generating  random  variates  using  the  Inverse  distribution  function,  and  coding  the  simu¬ 
lation  to  provide  a  monotonlc  transformation  between  the  Inputs  and  outputs  —  In  this 
case  between  the  Inputs  x.«, ,  e*.,  e;t"  and  the  output  . 

Often  when  the  control  C  Is  external,  setting  B  =  Ir,  the  p  dimensional  Identity 
matrix.  Is  close  to  optimal  (Hammersley  and  Handscomb,  1064);  but  even  In  the  external 
control  variate  case,  estimating  B‘  Is  sometimes  worthwhile  when  studying  DIST 
(Swain,  1082;  Swain  and  Schmelser,  1084),  especially  when  the  errors  are  nonnormal 
(Swain,  1084).  The  most  common  estimator  Is  the  regression  estimator,  obtained  by 
substituting  sample  values  for  the  covariance  and  variance  terms  In  the  definition  of  B  * . 
Swain  (1082)  studies  a  variety  of  estimators  of  B‘ . 

Estimating  B‘  gives  rise  to  Issues  of  both  bias  and  efficiency.  Bias  can  be  avoided 
by  using  a  splitting  estimator,  as  discussed  In  Kahn  (1056),  Tocher  (1083),  and  Swain 
and  Schmelser  (1083).  Efficiency  depends  upon  the  choice  and  number  of  control  vari¬ 
ates  employed,  as  discussed  by  Lavenberg  and  Welch  (1081),  Porta  Nova  (1085),  Rubin¬ 
stein  and  Marcus  (1085),  and  Venkatraman  and  Wilson  (1085).  The  loss  In  efficiency  for 
each  additional  control  variate  used  when  B  *  Is  estimated  may  or  may  not  be  offset  by 
the  additional  Information  carried  In  the  correlation  between  the  new  component  of  C 
and  Z .  Fortunately,  In  this  sense,  DIST  gives  rise  to  only  a  limited  number  of  natural, 
high-quality  control  variates. 

The  control  variate  technique  Is  not  the  only  simulation  swindle  that  is  possible 
for  DIST,  but  an  advantage  of  control  variates  Is  that  they  can  be  applied  simultane¬ 
ously  to  several  estimators.  In  contrast,  many  other  variance  reduction  techniques  lead 
to  variance  reduction  for  one  estimator  while  leading  to  a  variance  Increase  for  another 
estimator. 


5.  APPLICATION  OF  CONTROL  VARIATES  TO  DIST 

Control  variate  estimators  for  DIST  are  developed  In  this  section  based  on  the 
delta  and  gamma  approximations.  Estimation  of  arbitrary  properties  of  the  distribution 
of  ©  Is  considered.  Means,  higher-order  marginal  moments,  the  variances  and  covari¬ 
ances,  and  percentiles  of  the  sampling  distribution  of  ©  are  given  special  attention.  The 
new  algorithm  Is  Identical  to  the  direct  Monte  Carlo  method  stated  earlled,  with  the 
modification  of  step  2  to  replace  the  direct  estimators  with  the  control  variate  estima¬ 
tors  and  the  addition  of  step  11: 


11.  Compute  S: ,  or  %m  from  ,  y;. 

The  definition  of  &,  and  Tf;.  was  given  In  Section  2.  In  the  general  case,  defined  by  the 
direct  Monte  Carlo  procedure  of  Section  4,  some  care  and  modification  Is  necessary  for 
Implementing  step  11. 

When  rj  ft  ri' ,  care  should  be  taken  to  use  »j  rather  than  *i‘  when  computing  F . 

When  ,  6‘  Is  Inappropriate  In  the  definition  of  a,  since  for  example  the  dimen¬ 
sionality  of  A  may  be  different  than  that  of  B‘ .  In  this  case,  replace  B *  with  d,  the 
estimator  from  the  direct  experiment.  Then  6:m  can  not  be  calculated  until  after  all 
the  direct  sampling  Is  completed.  No  extra  sampling  Is  required. 

When  the  weight  matrix  W  Is  unknown,  replace  W  with  [Var  [£]]■*,  where  the  esti¬ 
mate  Is  again  from  the  direct  sampling  experiment  and  calculation  of 
6.,  ,u  =  x,2,  •  •  •  m ,  Is  delayed  until  later. 

When  •  the  design  matrix  X  Is  not  deterministic,  or  W  Is  unknown  the  sampling 
distribution  of  A  Is  unknown.  In  these  and  other  situations  In  which  the  distribution  of 
A  Is  unknown,  a  supplementary  Monte  Carlo  experiment  Is  run  to  obtain  the  needed 
properties  of  A  to  negligible  error.  The  observations  of  a  are  much  cheaper  to  obtain 
than  observations  of  e,  so  this  supplementary  run  could  obtain  the  negligible  error  with 
a  large  sample  size.  However,  variance  reduction  can  be  applied  to  the  supplementary 
experiment  as  well.  The  most  obvious  method  Is  to  use  conditional  expectations,  which 
means  that  rather  than  calculating  the  properties  of  Interest  from  the  observed  values  of 
a,  the  sample  average  of  the  properties  of  Interest  Is  calculated. 

There  Is  an  option  to  the  supplementary  experiment  —  perturb  the  direct  Monte 
Carlo  observations  to  obtain  a  control  system  for  which  the  properties  of  A  are  known. 
For  example,  set  the  design  matrix  to  the  mean  (or  median)  value,  and  transform  non- 
normal  error  terms  to  normality  using 

=Var  I/;[E1  *"  (H  (€:.)) 

where  *  Is  the  p -dimensional  N(o,I)  distribution  function.  Unfortunately,  setting  the 
design  matrix  to  a  constant  decreases  the  correlation  Induced  between  the  system.  In 
addition,  the  first  author  has  noted  empirically  that  very  nonnormal  errors  (e.g.,  uni¬ 
form)  also  result  In  substantially  less  variance  reduction.  Therefore,  the  supplemental 
experiment  seems  to  be  the  preferred  general  method  for  determining  propoertles  of  A. 

However,  In  many  cases  the  supplemental  run  Is  unnecessary  because  the  proper¬ 
ties  of  a  are  known  analytically.  In  particular,  assume  the  design  matrix  Is  determinis¬ 
tic,  n  —  r}‘,  and  the  errors  are  homogeneous  normal  with  zero  mean.  The  control  variate 
estimators  with  the  values  of  the  known  mean  of  a  are  given  In  the  next  four  sections 
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for  means,  higher  order  marginal  moments,  the  variance  matrix,  and  percentiles. 

5.1  Control  Variate  Estimators  for  the  Mean 

Since  the  estimators  for  nonlinear  models  are  biased,  estimating  the  mean  of  ©  Is 
of  Interest.  The  direct  estimator  for  the  mean  vector,  p'  —  E  [©],  Is 

p  =  m-‘  ©:.  . 

which  Is  unbiased.  The  delta  approximation  sample  mean  Is 

a  =  m _l  2  a:,  , 

•—l 

which  has  mean  9‘ .  The  gamma  approximation  sample  mean  Is 

F  =  m-‘  £)  r:.  , 

•  —  1 

whose  mean  Is  given  by  Box  (1971),  Clarice  (1980),  and  Swain  (1985).  The  general  form 
of  the  control  estimator  for  the  mean  Is 

A'(fl)  =  A-£«7-E[C])  . 

where  C  =  a  or  C  —  r. 

5.2  Control  Variate  Estimators  for  Marginal  Moments 

Control  variates  for  estimating  E  (©*  ]  are  developed  In  this  section  based  on  two 
generalizations  of  the  approach  Just  described  for  estimating  the  mean.  The  first,  the 
approximator  power  control.  Is  based  on  the  sample  average  of  appropriate  powers  of  the 
delta  or  gamma  approximators  using  the  original  parameterization.  The  second,  the 
power  transformation  control,  uses  the  estimators  from  Section  5.1  directly,  except  that 
the  delta  and  gamma  approximations  arise  from  the  new  parameterization  #*)  ss0*  of 
the  same  surface  n- 

In  the  approximator  power  control, 

A(*)  =  m“‘  2  A.*  , 

•  —1 

and 


f(*)  =  tn-'  £  r*  . 

«  — 1 

The  approximator  power  control  variate  estimator  for  the  moments  Is 

fit  {B)  =  ftl  -B  (C-E[C})  . 

where  C  =  a(*  )  or  C  —  T{k).  The  mean  E  [£(*))  Is  provided  In  Appendix  A.  The  mean 
E  (F(i  )1  Is  conceptually  straightforward,  but  not  algebraically  appealing. 


In  the  power  transformation  control, 


i 


and 


r'(A)  =  m-‘  s  I?. (A)  , 

•  — 1 

where 


A'.  (A )  =  ' (A  )  +  [F : 7  (<t> '  (A ))  WF (* ' *  (A  ))]-*  FT  (+’ (k))  W  E.% 


where  F (<Kk ))  =  F (0) 
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Clarke  (1980)  derives  the  more  complicated  expressions  for  r.',(A)  as  well  as  for  any  arbi¬ 
trary  transformation. 

The  equation  for  A'..  (A)  .follows  directly  from  the  chain  rule.  The  matrix  of 
derivatives  Is  the  diagonal  matrix  whose  1/A  diagonal  element  Is  A-1  (0<l"*)  .  so  applica¬ 
tion  of  the  chain  rule  Is  straightforward. 

The  power  transformation  control  variate  estimator  Is 

f4(B)  =  pl  -B  (  G  -E[  C  ])  . 

where  C  —  a  '(A )  or  C  =  r'(A ).  E  [a'(A  ))  —  (O’  )* .  The  mean  for  the  gamma  approxima¬ 
tion  estimator  Is  given  In  Swain  (1985). 

5.3  Control  Estimator  for  the  Variance  Matrix  of  © 

This  section  concentrates  on  control  variates  for  estimating  Var  [©].  Although  vari¬ 
ances  and  covariances  could  be  estimated  by  differences  of  raw  moments,  such  estima¬ 
tors  are  biased  and  have  been  found  (Swain  1982)  to  have  greater  variance  than  the  con¬ 
trol  variate  estimators  developed  here.  The  two  approaches  of  the  last  section  reduce  to 
the  same  method  here,  since  the  random  variable  of  Interest  Is  # l)  =  O'. 

Consider  and  tc ,  the  usual  unbiased  estimators  of  the  variance  matrices  Var  (©) 
and  Var  (C|.  Since  these  matrices  are  symmetric,  let  Sd  and  Sc  be  the  p(p  + 1)/2  element 
vectors  containing  the  lower  triangular  portions  of  Ed  and  tc  stored  rowwise.  Let  S6 
and  Sc  denote  the  direct  Monte  Carlo  estimators.  Then  a  control  variate  estimator  for 
Var  [©]  IS 

$*(B)  =  Ss-B(Sc  -E&?])  . 


i 


* 

l 


l 


Either  Sc  —  or  Sc  =  Sr,  the  sample  estimators  of  Var  [A]  and  Var  (r).  When  the 
delta  approximation  Is  used, 

E|EJ  =  [FT(0’)  W  F(0’)\-lFT(0’)  W  Var  [E]  W  F(0‘)[FT(6’)  W  F(9’)TX  . 

The  mean  of  the  gamma  control  variate  Is  given  In  Swain  (1985). 

For  the  first  time  estimation  of  the  control  weights  B  Is  not  straightforward  via 
regression.  Other  than  using  the  Identity  matrix,  the  other  obvious  option  Is  to  group 
the  m  Monte  Carlo  observations  Into  k  Independent  batches  from  which  an  estimator 
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•"  »*“,  V*  V,  «'  •  ,  *  , 


can  be  formed. 


5.4  Control  Estimation  of  the  Percentiles 

The  direct  Monte  Carlo  estimator  of  P(©  <  r),  where  P(&  <  r)  and  r  are  p- 
component  vectors,  Is 


P(e  <r)  =  nr1  f;  Ue:.)  . 

*  —i 

where  /,(©., )  Is  the  p  -component  Indicator  function  with  elements 


Ir,di) 


1  <  ri 

0  otherwise  . 


The  delta  and  gamma  approximation  control  variates  are  P(A<r)  and  P(T<r),  the 
observed  fractions  of  the  control  samples  less  than  r.  The  Comlsh-Flsher  expansion  can 
be  used  to  approximate  these  probabilities  for  the  gamma  approximation  control.  The 
form  of  the  control  variate  estimator  Is 

p(e<r,  B)  —  P(Q<r)~  B  ( P(C<r)-P(C<T ) 


where  C  —  A  or  C  —  r. 

Indicator  functions  and  control  variate  estimators  are  defined  analogously  for 
estimating  probabilities  of  confidence  Intervals  and  regions. 

Other  control  variate  methods  based  on  the  delta  or  gamma  approximations  are 
possible.  One  method,  suggested  by  Rothery  (1982)  In  a  different  context.  Is  to  con¬ 
struct  a  2X2  contingency  table  on  ©  and  A  falling  above  and  below  their  critical  points  r. 
The  maximum  likelihood  estimator  Is  then  the  control  variate.  The  efficiencies  obtained 
by  this  method  are  comparable  to  the  linear  control  formulation. 

Estimation  of  the  atk  quantile  of  ©,  ©(a)  satisfying  P  (©<©(«))  ==  a,  is  more  difficult 
than  estimation  of  percentiles,  despite  the  close  relationship  between  percentiles  and 
quantiles,  ©(a)  can  be  estimated  using  Interpolation  based  on  percentile  estimates,  order 
statistics,  or  stochastic  approximation.  Control  variate  methods  based  on  these  estima¬ 
tors  can  be  formed  In  the  usual  way,  although  again  the  mean  of  the  control  variable 
must  be  known  or  a  supplemental  experiment  performed.  Some  details  are  given  In 
Swain  (1982). 


APPENDIX:  LINEAR  POWER-CONTROL  EXPECTATIONS 

The  k th  power  of  the  linear  power  control  variate  required  In  Section  5.2  Is 
a*  =[0* +/£]*,  where  componentwise  exponentiation  Is  Indicated,  and 
j  =  (Fr(0*)w'F(0*)rl  Fr( O’)  W  is  the  transformation  for  the  linear  approximation.  The 
\th  component  of  the  vector  a*  Is 

A  *  =  (#,'  +  J(E)k  =  2  ( *•)  (*,•'  y  (Ji  E  )*-'  .•  -l  A  ■■■.?,  (A.1) 

y-o  } 

where  J{  Is  the  \th  row  of  J .  If  7  Is  constant  and  F~N(o,  la3),  then 


E[{J{E)1)  =0. 

E  (M  S  4* 

y-» 

E  [( J,  E  )*]  =  0, 


IVi  E  n  =  <r«  6  £  2  4?J#  Hr  3  £  /,/  , 

y->y-y+i  y-« 


where  4  Is  the  JtA  component  of  J(.  Substitution  Into  equation  (A.1)  yields  for 

«  —  1,2,  •  •  •  ,p 

E  [a,]  =  *,*. 

e(a,?]  =  (#,')» +  ^£4? 

y— 1 

E  [A,*l  =(*')*  + 3*.  v£j</. 

y-» 

and 

E  [A/]  =  <*,*  )*  +  6(tf/  )V  £  4?  +  <r4  8  £  £  JifJif,  +  3  £  4*  . 

y-i  y-iy'-y+»  y-» 
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